 high-quality explanation


Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO

arXiv.org Artificial Intelligence

While Large Language Models (LLMs) excel at generating human-like text, aligning their outputs with complex, qualitative goals like pedagogical soundness remains a significant challenge. Standard reinforcement learning techniques often rely on slow and expensive LLM-as-a-judge evaluations or on brittle, keyword-based metrics like ROUGE, which fail to capture the semantic essence of a high-quality explanation. In this work, we introduce a novel approach to reward shaping within the Group Relative Policy Optimisation (GRPO) framework. Our central contribution is the use of a small, efficient encoder-only transformer as a semantic reward model. This model provides a dense, semantically rich reward signal based on the cosine similarity between a generated explanation and a ground-truth reference, guiding the policy towards explanations that are not just factually correct but also structurally and conceptually aligned with expert reasoning. We apply this method to the task of training a model for the Italian medical-school entrance examinations, following standard domain-adaptive continued pre-training (CPT) and supervised fine-tuning (SFT). Our results demonstrate that GRPO with our proposed semantic reward significantly improves explanation faithfulness and clarity over a strong SFT baseline, showcasing the power of using lightweight encoder models for nuanced reward shaping in complex generation tasks.
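The reward described in this abstract can be illustrated with a minimal sketch. It assumes the encoder has already mapped each generated explanation and the reference to a fixed-size embedding; the encoder itself is not shown, and the toy vectors below stand in for its output. The function names are hypothetical, and the mean/standard-deviation normalisation is the common GRPO formulation rather than something the abstract specifies.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_rewards(candidate_embeddings, reference_embedding):
    """One reward per sampled explanation: similarity to the reference embedding."""
    return [cosine_similarity(e, reference_embedding) for e in candidate_embeddings]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one sampled group (assumed GRPO-style baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy embeddings standing in for encoder outputs (illustrative values only).
reference = [1.0, 0.0, 0.0]
group = [
    [1.0, 0.0, 0.0],  # same direction as the reference
    [1.0, 1.0, 0.0],  # partially aligned
    [0.0, 1.0, 0.0],  # orthogonal to the reference
]
rewards = semantic_rewards(group, reference)
advantages = group_relative_advantages(rewards)
```

Because the advantage is computed relative to the other samples in the same group, a candidate is rewarded for being closer to the reference than its siblings rather than for exceeding a fixed threshold.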


Selective Explanations

arXiv.org Artificial Intelligence

Feature attribution methods explain black-box machine learning (ML) models by assigning importance scores to input features. These methods can be computationally expensive for large ML models. To address this challenge, there have been increasing efforts to develop amortized explainers, where a machine learning model is trained to predict feature attribution scores with only one inference. Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations. In this paper, we propose selective explanations, a novel feature attribution method that (i) detects when amortized explainers generate low-quality explanations and (ii) improves these explanations using a technique called explanations with initial guess. Our selective explanation method allows practitioners to specify the fraction of samples that receive explanations with initial guess, offering a principled way to bridge the gap between amortized explainers and their high-quality counterparts.
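The selection step in this abstract, where a chosen fraction of samples receives the more expensive treatment, might be sketched as follows. This is only an illustration under stated assumptions: the per-sample `uncertainty_scores` are a hypothetical stand-in for whatever quality detector the paper uses, and `refine` is a placeholder callable for explanations with initial guess, whose details the abstract does not give.

```python
import math


def select_for_refinement(uncertainty_scores, fraction):
    """Indices of the `fraction` of samples with the highest (hypothetical) uncertainty."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    n = len(uncertainty_scores)
    k = math.ceil(fraction * n)
    ranked = sorted(range(n), key=lambda i: uncertainty_scores[i], reverse=True)
    return set(ranked[:k])


def selective_explain(amortized, uncertainty_scores, fraction, refine):
    """Keep the amortized explanation unless the sample was selected for refinement."""
    chosen = select_for_refinement(uncertainty_scores, fraction)
    return [
        refine(i, attribution) if i in chosen else attribution
        for i, attribution in enumerate(amortized)
    ]


# Toy data: four samples, each with a two-feature attribution vector.
amortized = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1], [0.4, 0.6]]
scores = [0.1, 0.9, 0.3, 0.7]


def placeholder_refine(index, attribution):
    # Stand-in for the costlier method; it only tags the result here.
    return ("refined", attribution)


result = selective_explain(amortized, scores, 0.5, placeholder_refine)
```

Setting `fraction` to 0 returns the amortized explanations unchanged, and setting it to 1 refines every sample, so the parameter trades compute for explanation quality.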


Chang

AAAI Conferences

Recommender systems face several challenges, e.g., recommending novel and diverse items and generating helpful explanations. Where algorithms struggle, people may excel. We therefore designed CrowdLens to explore different workflows for incorporating people into the recommendation process. We did an online experiment, finding that: (i) compared to a state-of-the-art algorithm, crowdsourcing workflows produced more diverse and novel recommendations favored by human judges; (ii) some crowdworkers produced high-quality explanations for their recommendations, and we created an accurate model for identifying high-quality explanations; (iii) volunteers from an online community generally performed better than paid crowdworkers, but appropriate algorithmic support erased this gap. We conclude by reflecting on lessons of our work for those considering a crowdsourcing approach and identifying several fundamental issues for future work.